{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# **MLFlow: Unified Platform for Experiment Tracking and Model Registry**\n",
    "\n",
    "MLflow is an open-source platform for managing the end-to-end machine learning lifecycle. It provides a suite of tools and components designed to streamline the development, experimentation, productionization, and collaboration aspects of machine learning projects. MLflow is widely used by data scientists, machine learning engineers, and researchers to track experiments, package and share code, and deploy models at scale.\n",
    "\n",
    "## **Key Features:**\n",
    "1. Experiment Tracking\n",
    "2. Model Registry\n",
    "\n",
    "<img src=\"img/tracking_experiments.PNG\">\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## **Introduction to Experiment Tracking**\n",
    "\n",
    "#### **Why Track?**\n",
    "1. Organization  \n",
    "2. Optimization  \n",
    "3. Reproducibility  \n",
    "\n",
    "\n",
    "#### **What do you want to track for each Experiment Run?**\n",
    "1. Training and Validation Data Used\n",
    "2. Hyperparameters\n",
    "3. Metrics\n",
    "4. Models\n",
    "5. Training Time\n",
    "6. Model Size\n",
    "\n",
    "\n",
    "#### **Tool - MLFlow**  \n",
    "MLFlow helps you to organize your experiments into runs.\n",
    "\n",
    "\n",
    "#### **Terminologies:**\n",
    "1. Experiment  \n",
    "2. Run  \n",
    "3. Metadata  (i.e. Tags, Parameters, Metrics)  \n",
    "4. Artifacts (i.e. Output files associated with experiment runs)\n",
    "\n",
    "\n",
    "#### **MLFlow keeps track of:**\n",
    "> Tags  \n",
    "> Parameters  \n",
    "> Metrics  \n",
    "> Models  \n",
    "> Artifact  \n",
    "> Source code, Start and End Time, Authors etc..\n",
    "\n",
    "\n",
    "**Run below mentioned commands to install mlflow on your system:**  \n",
    "```\n",
    "pip install mlflow\n",
    "```\n",
    "\n",
    "**Run the following commnd to open the MLFlow Dashboard**\n",
    "```\n",
    "mlflow ui\n",
    "mlflow ui --backend-store-uri sqlite:///mlflow.db\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## **Loading the Data**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "iris = pd.read_csv('data/iris.csv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Id</th>\n",
       "      <th>SepalLengthCm</th>\n",
       "      <th>SepalWidthCm</th>\n",
       "      <th>PetalLengthCm</th>\n",
       "      <th>PetalWidthCm</th>\n",
       "      <th>Species</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>5.1</td>\n",
       "      <td>3.5</td>\n",
       "      <td>1.4</td>\n",
       "      <td>0.2</td>\n",
       "      <td>Iris-setosa</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>4.9</td>\n",
       "      <td>3.0</td>\n",
       "      <td>1.4</td>\n",
       "      <td>0.2</td>\n",
       "      <td>Iris-setosa</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>4.7</td>\n",
       "      <td>3.2</td>\n",
       "      <td>1.3</td>\n",
       "      <td>0.2</td>\n",
       "      <td>Iris-setosa</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>4.6</td>\n",
       "      <td>3.1</td>\n",
       "      <td>1.5</td>\n",
       "      <td>0.2</td>\n",
       "      <td>Iris-setosa</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>5.0</td>\n",
       "      <td>3.6</td>\n",
       "      <td>1.4</td>\n",
       "      <td>0.2</td>\n",
       "      <td>Iris-setosa</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species\n",
       "0   1            5.1           3.5            1.4           0.2  Iris-setosa\n",
       "1   2            4.9           3.0            1.4           0.2  Iris-setosa\n",
       "2   3            4.7           3.2            1.3           0.2  Iris-setosa\n",
       "3   4            4.6           3.1            1.5           0.2  Iris-setosa\n",
       "4   5            5.0           3.6            1.4           0.2  Iris-setosa"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "iris.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(150, 6)\n"
     ]
    }
   ],
   "source": [
    "print(iris.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## **Running the Experiment**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "from sklearn.preprocessing import StandardScaler, MinMaxScaler\n",
    "\n",
    "from sklearn.neighbors import KNeighborsClassifier\n",
    "from sklearn.svm import SVC\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "from sklearn.ensemble import RandomForestClassifier\n",
    "from sklearn.tree import DecisionTreeClassifier\n",
    "from sklearn.naive_bayes import GaussianNB\n",
    "\n",
    "from sklearn.model_selection import GridSearchCV\n",
    "\n",
    "from sklearn.pipeline import Pipeline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "import warnings\n",
    "\n",
    "warnings.filterwarnings('ignore')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "X = iris[['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm']]\n",
    "\n",
    "y = iris['Species']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(112, 4) (38, 4)\n"
     ]
    }
   ],
   "source": [
    "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)\n",
    "\n",
    "print(X_train.shape, X_test.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## **Auto Logging KNN Experiment Run using MLFlow**\n",
    "\n",
    "**Step 1 - Import MLFlow and set the experiment name**\n",
    "```python\n",
    "import mlflow\n",
    "\n",
    "mlflow.set_experiment(\"EXPERIMENT_NAME\")\n",
    "```\n",
    "\n",
    "**Step 2 - Start the auto logger**\n",
    "```python\n",
    "mlflow.sklearn.autolog()\n",
    "\n",
    "# Initialize the auto logger\n",
    "# max_tuning_runs=None will make sure that all the runs are recorded.\n",
    "# By default top 5 runs will be recorded for each experiment\n",
    "```\n",
    "**Step 3 - Start the experiment run**\n",
    "```python\n",
    "with mlflow.start_run() as run:\n",
    "    clf.fit(X_train, y_train)\n",
    "```\n",
    "\n",
    "\n",
    "\n",
    "<img src=\"img/tracking_experiments_hyperparameters.JPG\">"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Experiment: artifact_location='file:///C:/Users/DELL/Desktop/github/bansalkanav/Machine_Learning_and_Deep_Learning/Module%205%20-%20MLOPs/3.%20Experiment%20Tracking%20and%20Model%20Management/mlruns/315136114113215422', creation_time=1710940361468, experiment_id='315136114113215422', last_update_time=1710940361468, lifecycle_stage='active', name='iris_species_prediction', tags={}>"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import mlflow\n",
    "\n",
    "mlflow.set_experiment(\"iris_species_prediction\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define pipeline steps\n",
    "pipe_1 = Pipeline(\n",
    "    [\n",
    "        ('scaler', StandardScaler()),\n",
    "        ('classifier', KNeighborsClassifier())\n",
    "    ]\n",
    ")\n",
    "\n",
    "\n",
    "# Observe the Key Value Pair format\n",
    "parameter_grid_1 = [\n",
    "    {\n",
    "        'scaler': [StandardScaler(), MinMaxScaler()],\n",
    "        'classifier__n_neighbors' : [i for i in range(3, 21, 2)],              \n",
    "        'classifier__p' : [1, 2, 3]\n",
    "    }\n",
    "]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "2024/03/20 18:47:40 WARNING mlflow.utils.git_utils: Failed to import Git (the Git executable is probably not on your PATH), so Git SHA is not available. Error: Failed to initialize: Bad git executable.\n",
      "The git executable must be specified in one of the following ways:\n",
      "    - be included in your $PATH\n",
      "    - be set via $GIT_PYTHON_GIT_EXECUTABLE\n",
      "    - explicitly set via git.refresh()\n",
      "\n",
      "All git commands will error until this is rectified.\n",
      "\n",
      "This initial message can be silenced or aggravated in the future by setting the\n",
      "$GIT_PYTHON_REFRESH environment variable. Use one of the following values:\n",
      "    - quiet|q|silence|s|silent|none|n|0: for no message or exception\n",
      "    - warn|w|warning|log|l|1: for a warning message (logged at level CRITICAL, displayed by default)\n",
      "    - error|e|exception|raise|r|2: for a raised exception\n",
      "\n",
      "Example:\n",
      "    export GIT_PYTHON_REFRESH=quiet\n",
      "\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Fitting 5 folds for each of 54 candidates, totalling 270 fits\n",
      "CPU times: total: 31 s\n",
      "Wall time: 40.8 s\n"
     ]
    }
   ],
   "source": [
    "clf = GridSearchCV(\n",
    "    estimator=pipe_1, \n",
    "    param_grid=parameter_grid_1, \n",
    "    scoring='accuracy',\n",
    "    cv=5,\n",
    "    return_train_score=True,\n",
    "    verbose=1\n",
    ")\n",
    "\n",
    "# Initialize the auto logger\n",
    "# max_tuning_runs=None will make sure that all the runs are recorded.\n",
    "# By default top 5 runs will be recorded for each experiment\n",
    "mlflow.sklearn.autolog(max_tuning_runs=None)\n",
    "\n",
    "with mlflow.start_run() as run:\n",
    "    %time clf.fit(X_train, y_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## **Auto Logging SVM Experiment Run using MLFlow**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "pipe_2 = Pipeline(\n",
    "    [\n",
    "        ('scaler', StandardScaler()),\n",
    "        ('classifier', SVC())\n",
    "    ]\n",
    ")\n",
    "\n",
    "\n",
    "# Observe the Key Value Pair format\n",
    "parameter_grid_2 = [\n",
    "    {\n",
    "        'scaler': [StandardScaler(), MinMaxScaler()],\n",
    "        'classifier__kernel' : ['rbf'], \n",
    "        'classifier__C' : [0.1, 0.01, 1, 10, 100]\n",
    "    }, \n",
    "    {\n",
    "        'scaler': [StandardScaler(), MinMaxScaler()],\n",
    "        'classifier__kernel' : ['poly'], \n",
    "        'classifier__degree' : [2, 3, 4, 5], \n",
    "        'classifier__C' : [0.1, 0.01, 1, 10, 100]\n",
    "    }, \n",
    "    {\n",
    "        'scaler': [StandardScaler(), MinMaxScaler()],\n",
    "        'classifier__kernel' : ['linear'], \n",
    "        'classifier__C' : [0.1, 0.01, 1, 10, 100]\n",
    "    }\n",
    "]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Fitting 5 folds for each of 60 candidates, totalling 300 fits\n",
      "CPU times: total: 22.3 s\n",
      "Wall time: 31.5 s\n"
     ]
    }
   ],
   "source": [
    "clf = GridSearchCV(\n",
    "    estimator=pipe_2, \n",
    "    param_grid=parameter_grid_2, \n",
    "    scoring='accuracy',\n",
    "    cv=5,\n",
    "    return_train_score=True,\n",
    "    verbose=1\n",
    ")\n",
    "\n",
    "# Initialize the auto logger\n",
    "# max_tuning_runs=None will make sure that all the runs are recorded.\n",
    "# By default top 5 runs will be recorded for each experiment\n",
    "mlflow.sklearn.autolog(max_tuning_runs=None)\n",
    "\n",
    "with mlflow.start_run() as run:\n",
    "    %time clf.fit(X_train, y_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## **Auto Logging All Experiment Runs using MLFlow**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "pipelines = {\n",
    "    'knn' : Pipeline([\n",
    "        ('scaler', StandardScaler()),\n",
    "        ('classifier', KNeighborsClassifier())\n",
    "    ]), \n",
    "    'svc' : Pipeline([\n",
    "        ('scaler', StandardScaler()),\n",
    "        ('classifier', SVC())\n",
    "    ]),\n",
    "    'logistic_regression': Pipeline([\n",
    "        ('scaler', StandardScaler()),\n",
    "        ('classifier', LogisticRegression())\n",
    "    ]),\n",
    "    'random_forest': Pipeline([\n",
    "        ('scaler', StandardScaler()),\n",
    "        ('classifier', RandomForestClassifier())\n",
    "    ]),\n",
    "    'decision_tree': Pipeline([\n",
    "        ('scaler', StandardScaler()),\n",
    "        ('classifier', DecisionTreeClassifier())\n",
    "    ]),\n",
    "    'naive_bayes': Pipeline([\n",
    "        ('scaler', StandardScaler()),\n",
    "        ('classifier', GaussianNB())\n",
    "    ])\n",
    "}\n",
    "\n",
    "# Define parameter grid for each algorithm\n",
    "param_grids = {\n",
    "    'knn': [\n",
    "        {\n",
    "            'scaler': [StandardScaler(), MinMaxScaler()],\n",
    "            'classifier__n_neighbors' : [i for i in range(3, 21, 2)], \n",
    "            'classifier__p' : [1, 2, 3]\n",
    "        }\n",
    "    ],\n",
    "    'svc': [\n",
    "        {\n",
    "            'scaler': [StandardScaler(), MinMaxScaler()],\n",
    "            'classifier__kernel' : ['rbf'], \n",
    "            'classifier__C' : [0.1, 0.01, 1, 10, 100]\n",
    "        }, \n",
    "        {\n",
    "            'scaler': [StandardScaler(), MinMaxScaler()],\n",
    "            'classifier__kernel' : ['poly'], \n",
    "            'classifier__degree' : [2, 3, 4, 5], \n",
    "            'classifier__C' : [0.1, 0.01, 1, 10, 100]\n",
    "        }, \n",
    "        {\n",
    "            'scaler': [StandardScaler(), MinMaxScaler()],\n",
    "            'classifier__kernel' : ['linear'], \n",
    "            'classifier__C' : [0.1, 0.01, 1, 10, 100]\n",
    "        }\n",
    "    ],\n",
    "    'logistic_regression': [\n",
    "        {\n",
    "            'scaler': [StandardScaler(), MinMaxScaler()],\n",
    "            'classifier__C': [0.1, 1, 10], \n",
    "            'classifier__penalty': ['l2']\n",
    "        }, \n",
    "        {\n",
    "            'scaler': [StandardScaler(), MinMaxScaler()],\n",
    "            'classifier__C': [0.1, 1, 10], \n",
    "            'classifier__penalty': ['l1'], \n",
    "            'classifier__solver': ['liblinear']\n",
    "        }, \n",
    "        {\n",
    "            'scaler': [StandardScaler(), MinMaxScaler()],\n",
    "            'classifier__C': [0.1, 1, 10], \n",
    "            'classifier__penalty': ['elasticnet'], \n",
    "            'classifier__l1_ratio': [0.4, 0.5, 0.6],\n",
    "            'classifier__solver': ['saga']\n",
    "        }\n",
    "    ],\n",
    "    'random_forest': [\n",
    "        {\n",
    "            'scaler': [StandardScaler(), MinMaxScaler()],\n",
    "            'classifier__n_estimators': [50, 100, 200]\n",
    "        }\n",
    "    ],\n",
    "    'decision_tree': [\n",
    "        {\n",
    "            'scaler': [StandardScaler(), MinMaxScaler()],\n",
    "            'classifier__max_depth': [None, 5, 10]\n",
    "        }\n",
    "    ],\n",
    "    'naive_bayes': [\n",
    "        {\n",
    "            'scaler': [StandardScaler(), MinMaxScaler()]\n",
    "        }\n",
    "    ]\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "********** knn **********\n",
      "Fitting 5 folds for each of 54 candidates, totalling 270 fits\n",
      "CPU times: total: 27 s\n",
      "Wall time: 36.8 s\n",
      "Train Score:  0.9644268774703558\n",
      "Test Score:  0.9736842105263158\n",
      "\n",
      "********** svc **********\n",
      "Fitting 5 folds for each of 60 candidates, totalling 300 fits\n",
      "CPU times: total: 24 s\n",
      "Wall time: 26.6 s\n",
      "Train Score:  0.9644268774703558\n",
      "Test Score:  0.9736842105263158\n",
      "\n",
      "********** logistic_regression **********\n",
      "Fitting 5 folds for each of 30 candidates, totalling 150 fits\n",
      "CPU times: total: 14.6 s\n",
      "Wall time: 23.2 s\n",
      "Train Score:  0.9640316205533598\n",
      "Test Score:  0.9736842105263158\n",
      "\n",
      "********** random_forest **********\n",
      "Fitting 5 folds for each of 6 candidates, totalling 30 fits\n",
      "CPU times: total: 24.9 s\n",
      "Wall time: 36.5 s\n",
      "Train Score:  0.9553359683794467\n",
      "Test Score:  0.9736842105263158\n",
      "\n",
      "********** decision_tree **********\n",
      "Fitting 5 folds for each of 6 candidates, totalling 30 fits\n",
      "CPU times: total: 3.66 s\n",
      "Wall time: 15.2 s\n",
      "Train Score:  0.9640316205533598\n",
      "Test Score:  0.9736842105263158\n",
      "\n",
      "********** naive_bayes **********\n",
      "Fitting 5 folds for each of 2 candidates, totalling 10 fits\n",
      "CPU times: total: 1.97 s\n",
      "Wall time: 15.2 s\n",
      "Train Score:  0.9557312252964426\n",
      "Test Score:  1.0\n",
      "\n"
     ]
    }
   ],
   "source": [
    "best_models = {}\n",
    "\n",
    "# Run the Pipeline\n",
    "for algo in pipelines.keys():\n",
    "    print(\"*\"*10, algo, \"*\"*10)\n",
    "    grid_search = GridSearchCV(estimator=pipelines[algo], \n",
    "                               param_grid=param_grids[algo], \n",
    "                               cv=5, \n",
    "                               scoring='accuracy', \n",
    "                               return_train_score=True,\n",
    "                               verbose=1\n",
    "                              )\n",
    "    \n",
    "    mlflow.sklearn.autolog(max_tuning_runs=None)\n",
    "    \n",
    "    with mlflow.start_run() as run:\n",
    "        %time grid_search.fit(X_train, y_train)\n",
    "        \n",
    "    print('Train Score: ', grid_search.best_score_)\n",
    "    print('Test Score: ', grid_search.score(X_test, y_test))\n",
    "    \n",
    "    best_models[algo] = grid_search.best_estimator_\n",
    "    print()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Stop the auto logger\n",
    "\n",
    "mlflow.sklearn.autolog(disable=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## **Custom Experiment Tracking and Database Integration with MLFlow**\n",
    "\n",
    "**Step 1 - Import MLFlow**\n",
    "```python\n",
    "import mlflow\n",
    "```\n",
    "\n",
    "**Step 2 - Set the tracker and experiment**\n",
    "```python\n",
    "mlflow.set_tracking_uri(DATABASE_URI)\n",
    "mlflow.set_experiment(\"EXPERIMENT_NAME\")\n",
    "```\n",
    "\n",
    "**Step 3 - Start a experiment run**\n",
    "```python\n",
    "with mlflow.start_run():\n",
    "```\n",
    "\n",
    "**Step 4 - Logging the metadata**\n",
    "```python\n",
    "mlflow.set_tag(KEY, VALUE) \n",
    "mlflow.log_param(KEY, VALUE) \n",
    "mlflow.log_metric(KEY, VALUE)\n",
    "```\n",
    "\n",
    "**Step 5 - Logging the model and other files (2 ways)**  \n",
    "Way 1 -\n",
    "```python\n",
    "mlflow.<FRAMEWORK>.log_model(MODEL_OBJECT, artifact_path=\"PATH\")\n",
    "```  \n",
    "Way 2 - \n",
    "```python\n",
    "mlflow.log_artifact(LOCAL_PATH, artifact_path=\"PATH\")\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "import time\n",
    "import joblib\n",
    "import os"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "# mlflow.set_tracking_uri(\"sqlite:///mlflow_1.db\")\n",
    "\n",
    "# mlflow.set_experiment(\"Iris Species Prediction\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "********** knn **********\n",
      "Fitting 5 folds for each of 54 candidates, totalling 270 fits\n",
      "Train Score:  0.9644268774703558\n",
      "Test Score:  0.9736842105263158\n",
      "Fit Time:  4.625600576400757\n",
      "Predict Time:  0.005024909973144531\n",
      "Model Size:  11965\n",
      "\n",
      "********** svc **********\n",
      "Fitting 5 folds for each of 60 candidates, totalling 300 fits\n",
      "Train Score:  0.9644268774703558\n",
      "Test Score:  0.9736842105263158\n",
      "Fit Time:  2.8878395557403564\n",
      "Predict Time:  0.000997781753540039\n",
      "Model Size:  5634\n",
      "\n",
      "********** logistic_regression **********\n",
      "Fitting 5 folds for each of 30 candidates, totalling 150 fits\n",
      "Train Score:  0.9640316205533598\n",
      "Test Score:  0.9736842105263158\n",
      "Fit Time:  1.7286550998687744\n",
      "Predict Time:  0.0019011497497558594\n",
      "Model Size:  2142\n",
      "\n",
      "********** random_forest **********\n",
      "Fitting 5 folds for each of 6 candidates, totalling 30 fits\n",
      "Train Score:  0.9553359683794467\n",
      "Test Score:  0.9736842105263158\n",
      "Fit Time:  6.280713081359863\n",
      "Predict Time:  0.00601959228515625\n",
      "Model Size:  84679\n",
      "\n",
      "********** decision_tree **********\n",
      "Fitting 5 folds for each of 6 candidates, totalling 30 fits\n",
      "Train Score:  0.9640316205533598\n",
      "Test Score:  0.9736842105263158\n",
      "Fit Time:  0.2682819366455078\n",
      "Predict Time:  0.001995086669921875\n",
      "Model Size:  3528\n",
      "\n",
      "********** naive_bayes **********\n",
      "Fitting 5 folds for each of 2 candidates, totalling 10 fits\n",
      "Train Score:  0.9557312252964426\n",
      "Test Score:  1.0\n",
      "Fit Time:  0.09075665473937988\n",
      "Predict Time:  0.001997232437133789\n",
      "Model Size:  2070\n",
      "\n"
     ]
    }
   ],
   "source": [
    "dev = \"Kanav Bansal\"\n",
    "best_models = {}\n",
    "\n",
    "for algo in pipelines.keys():\n",
    "    print(\"*\"*10, algo, \"*\"*10)\n",
    "    grid_search = GridSearchCV(estimator=pipelines[algo], \n",
    "                               param_grid=param_grids[algo], \n",
    "                               cv=5, \n",
    "                               scoring='accuracy', \n",
    "                               return_train_score=True,\n",
    "                               verbose=1\n",
    "                              )\n",
    "\n",
    "    # Fit\n",
    "    start_fit_time = time.time()\n",
    "    grid_search.fit(X_train, y_train)\n",
    "    end_fit_time = time.time()\n",
    "\n",
    "    # Predict\n",
    "    start_predict_time = time.time()\n",
    "    y_pred = grid_search.predict(X_test)\n",
    "    end_predict_time = time.time()\n",
    "\n",
    "    # Saving the best model\n",
    "    joblib.dump(grid_search.best_estimator_, f'best_models/{algo}.pkl')\n",
    "    model_size = os.path.getsize(f'best_models/{algo}.pkl')\n",
    "\n",
    "    # Pring Log\n",
    "    print('Train Score: ', grid_search.best_score_)\n",
    "    print('Test Score: ', grid_search.score(X_test, y_test))\n",
    "    print(\"Fit Time: \", end_fit_time - start_fit_time)\n",
    "    print(\"Predict Time: \", end_predict_time - start_predict_time)\n",
    "    print(\"Model Size: \", model_size)\n",
    "    \n",
    "    print()\n",
    "\n",
    "    # Start the experiment run\n",
    "    with mlflow.start_run() as run:\n",
    "        # Log tags with mlflow.set_tag()\n",
    "        mlflow.set_tag(\"developer\", dev)\n",
    "\n",
    "        # Log Parameters with mlflow.log_param()\n",
    "        mlflow.log_param(\"algorithm\", algo)\n",
    "        mlflow.log_param(\"hyperparameter_grid\", param_grids[algo])\n",
    "        mlflow.log_param(\"best_hyperparameter\", grid_search.best_params_)\n",
    "\n",
    "        # Log Metrics with mlflow.log_metric()\n",
    "        mlflow.log_metric(\"train_score\", grid_search.best_score_)\n",
    "        mlflow.log_metric(\"test_score\", grid_search.score(X_test, y_test))\n",
    "        mlflow.log_metric(\"fit_time\", end_fit_time - start_fit_time)\n",
    "        mlflow.log_metric(\"predict_time\", end_predict_time - start_predict_time)\n",
    "        mlflow.log_metric(\"model_size\", model_size)\n",
    "\n",
    "        # Log Model using mlflow.sklearn.log_model()\n",
    "        mlflow.sklearn.log_model(grid_search.best_estimator_, f\"{algo}_model\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## **Introduction to Model Registry**\n",
    "\n",
    "Model Registry provides functionality for managing and versioning machine learning models and their associated metadata. It allows data scientists and machine learning engineers to track, share, and collaborate on models throughout their lifecycle, from experimentation to production deployment.\n",
    "\n",
    "Key Features:\n",
    "1. Model Registration\n",
    "2. Model Versioning\n",
    "3. Stage Transitions\n",
    "4. Intra Team Collaboration\n",
    "\n",
    "#### **Model Versioning**  \n",
    "In the context of software development and deployment, \"Archived,\" \"Staged,\" and \"Production\" are typically used to denote different stages or environments in the software lifecycle. These tags help developers and teams manage the lifecycle of software releases and ensure smooth transitions between different stages of development and deployment. Here's a brief explanation of each:\n",
    "\n",
    "1. **Archived**: These versions are no longer in active use.\n",
    "   - This tag is usually associated with versions of software or code that have been retired or deprecated. \n",
    "   - Archived versions are no longer actively maintained or used in production environments.\n",
    "   - Archived versions may be kept for historical reference or auditing purposes but are not intended for active use.\n",
    "\n",
    "2. **Staged**: These versions are ready for deployment pending final validation.\n",
    "   - The \"Staged\" tag is often used to represent versions of software or code that have undergone testing and are ready for deployment to a production environment.\n",
    "   - Staged versions have typically passed through development, testing, and quality assurance stages and are considered stable and reliable.\n",
    "   - In some development workflows, staged versions may be deployed to a pre-production or staging environment for final validation before being promoted to production.\n",
    "\n",
    "3. **Production**: These versions are actively serving users in live environments.\n",
    "   - The \"Production\" tag refers to versions of software or code that are actively running in a live environment and serving end-users or customers.\n",
    "   - Production versions are expected to be stable, performant, and reliable, as they are handling real-world traffic and interactions.\n",
    "   - Changes to production versions often follow strict release procedures and may involve deployment strategies such as blue-green deployment or canary releases to minimize disruptions.\n",
    "  \n",
    "\n",
    "<img src=\"img/model_management.PNG\">\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.13"
  },
  "toc": {
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  },
  "varInspector": {
   "cols": {
    "lenName": 16,
    "lenType": 16,
    "lenVar": 40
   },
   "kernels_config": {
    "python": {
     "delete_cmd_postfix": "",
     "delete_cmd_prefix": "del ",
     "library": "var_list.py",
     "varRefreshCmd": "print(var_dic_list())"
    },
    "r": {
     "delete_cmd_postfix": ") ",
     "delete_cmd_prefix": "rm(",
     "library": "var_list.r",
     "varRefreshCmd": "cat(var_dic_list()) "
    }
   },
   "types_to_exclude": [
    "module",
    "function",
    "builtin_function_or_method",
    "instance",
    "_Feature"
   ],
   "window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
